Near Optimal Work-Stealing Tree Scheduler for Highly Irregular Data-Parallel Workloads

نویسندگان

  • Aleksandar Prokopec
  • Martin Odersky
چکیده

We present a work-stealing algorithm for runtime scheduling of dataparallel operations in the context of shared-memory architectures on data sets with highly-irregular workloads that are not known a priori to the scheduler. This scheduler can parallelize loops and operations expressible with a parallel reduce or a parallel scan. The scheduler is based on the work-stealing tree data structure, which allows workers to decide on the work division in a lock-free, workloaddriven manner and attempts to minimize the amount of communication between them. A significant effort is given to showing that the algorithm has the least possible amount of overhead. We provide an extensive experimental evaluation, comparing the advantages and shortcomings of different data-parallel schedulers in order to combine their strengths. We show specific workload distribution patterns appearing in practice for which different schedulers yield suboptimal speedup, explaining their drawbacks and demonstrating how the work-stealing tree scheduler overcomes them. We thus justify our design decisions experimentally, but also provide a theoretical background for our claims.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Achieving Efficient Work-Stealing for Data-Parallel Collections

In modern programming high-level data-structures are an important foundation for most applications. With the rise of the multicore era, there is a growing trend of supporting data-parallel collection operations in general purpose programming languages and platforms. To facilitate object-oriented reuse these operations are highly parametric, incurring abstraction performance penalties. Furthermo...

متن کامل

On Lock-Free Work-stealing Iterators for Parallel Data Structures

In modern programming high-level data-structures are an important foundation for most applications. With the rise of multicores, there is a trend of supporting data-parallel collection operations in general purpose programming languages. These operations are highly parametric, incurring abstraction performance penalties. Furthermore, data-parallel operations must scale when applied to irregular...

متن کامل

Efficient Work Stealing for Portability of Nested Parallelism and Composability of Multithreaded Program

We present performance evaluations of parallel-for loop with work stealing technique. The parallel-for by work stealing transforms the parallel-loop into a form of binary tree by making use of method of divide-and-conquer. Iterations are distributed in the leaves procedures of the binary tree, and the parallel executions are performed by stealing subtrees from the bottom of the tree. The work s...

متن کامل

Greedy Sharing: Load Balancing on Weakly Consistent Memory

An efficient online scheduler is crucial for balancing irregular parallel computations in a multiprocessor system. Over the last two decades, variants of the work-stealing scheduler have emerged as a popular choice for hardware shared-memory systems. The state-of-the-art work-stealing algorithms can guarantee near-optimal asymptotic complexity by relying on simple yet powerful techniques to bal...

متن کامل

History-Based Adaptive Work Distribution

Exploiting parallelism of increasingly heterogeneous parallel architectures is challenging due to the complexity of parallelism management. To achieve high performance portability whilst preserving high productivity, high-level approaches to parallel programming delegate parallelism management, such as partitioning and work distribution, to the compiler and the run-time system. Random work stea...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013